Probability Basics

Joint distribution

The joint distribution of two random variables X and Y:

$$
p(x, y) = p(X=x, Y=y)
$$

Marginal distribution

The marginal distribution of X is obtained by summing the joint over all possible states of Y:

$$
p(X=x) = \sum_y p(X=x, Y=y)
$$

This is also called the sum rule or the rule of total probability.
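The sum rule can be checked on a small table. A minimal sketch with a made-up joint distribution over a binary X and a three-valued Y (all numbers are illustrative, not from the text):

```python
import numpy as np

# Hypothetical joint p(X, Y): rows indexed by x, columns by y.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])

assert np.isclose(joint.sum(), 1.0)  # a valid joint sums to 1

# Sum rule: marginalize out Y by summing over the Y axis.
p_x = joint.sum(axis=1)   # p(X=x) = sum_y p(X=x, Y=y)
print(p_x)                # [0.4 0.6]
```

Summing over `axis=0` instead would give the marginal of Y.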

Conditional joint

If X and Y are conditionally independent given Z, their conditional joint factorizes:

$$
p(X, Y|Z) = p(X|Z)\, p(Y|Z)
$$

In general, $p(X, Y|Z) = p(X|Y, Z)\, p(Y|Z)$.
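A quick numerical sanity check: if we build a joint directly as $p(z)\,p(x|z)\,p(y|z)$, the factorization of the conditional joint must hold. All tables below are made up for illustration:

```python
import numpy as np

# Made-up distribution where X and Y are conditionally independent given Z,
# constructed as p(z) * p(x|z) * p(y|z). Rows are indexed by z.
p_z = np.array([0.3, 0.7])            # p(Z=z)
p_x_given_z = np.array([[0.9, 0.1],   # p(X=x | Z=z)
                        [0.4, 0.6]])
p_y_given_z = np.array([[0.2, 0.8],   # p(Y=y | Z=z)
                        [0.5, 0.5]])

# joint[z, x, y] = p(z) p(x|z) p(y|z)
joint = p_z[:, None, None] * p_x_given_z[:, :, None] * p_y_given_z[:, None, :]

# Conditional joint p(X, Y | Z=z) = p(Z=z, X, Y) / p(Z=z)
for z in range(2):
    cond_joint = joint[z] / p_z[z]
    factorized = np.outer(p_x_given_z[z], p_y_given_z[z])
    assert np.allclose(cond_joint, factorized)  # p(X,Y|Z) = p(X|Z) p(Y|Z)
```

For a joint that is not built this way, the assertion would generally fail.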

Product rule

$$
p(A, B) = p(A|B)\, p(B) = p(B|A)\, p(A)
$$
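Both factorizations of the product rule can be verified from a joint table. A sketch with an illustrative joint over two binary variables:

```python
import numpy as np

# Hypothetical joint p(A, B) over binary A, B; joint[a, b].
joint = np.array([[0.3, 0.1],
                  [0.2, 0.4]])

p_a = joint.sum(axis=1)           # p(A=a), marginalize out B
p_b = joint.sum(axis=0)           # p(B=b), marginalize out A

p_a_given_b = joint / p_b         # p(A=a | B=b) = p(a, b) / p(b)
p_b_given_a = joint / p_a[:, None]

# Both factorizations recover the same joint:
assert np.allclose(p_a_given_b * p_b, joint)            # p(A|B) p(B)
assert np.allclose(p_b_given_a * p_a[:, None], joint)   # p(B|A) p(A)
```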

Chain rule

$$
p(A_1, A_2, A_3, \ldots, A_n) = p(A_1)\, p(A_2|A_1)\, p(A_3|A_1, A_2) \cdots p(A_n|A_1, A_2, \ldots, A_{n-1})
$$

Note that a Markov chain can be regarded as a simplified version of the chain rule: it predicts by ignoring all previous events except the last one(s), so that $p(A_n|A_1, \ldots, A_{n-1}) \approx p(A_n|A_{n-1})$.
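Under the Markov assumption, the chain-rule product collapses to an initial probability times one transition probability per step. A minimal sketch with a made-up 2-state chain:

```python
import numpy as np

# Toy 2-state Markov chain (states 0 and 1); numbers are illustrative.
p_init = np.array([0.5, 0.5])     # p(A1)
trans = np.array([[0.9, 0.1],     # trans[i, j] = p(A_{t+1}=j | A_t=i)
                  [0.3, 0.7]])

def sequence_prob(states):
    """Chain rule under the Markov assumption:
    p(a1, ..., an) = p(a1) * prod_t p(a_t | a_{t-1})."""
    p = p_init[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev, cur]
    return p

print(sequence_prob([0, 0, 1]))   # 0.5 * 0.9 * 0.1 = 0.045
```

The full chain rule would instead need a conditional table over the entire history, which grows exponentially with sequence length.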

$$
p(A|B, C)\, p(B|C) = p(B|A, C)\, p(A|C)
$$
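This identity can be checked numerically for every combination of values. A sketch using a randomly generated (purely illustrative) joint over three binary variables:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up joint p(A, B, C) over three binary variables, normalized to sum to 1.
joint = rng.random((2, 2, 2))
joint /= joint.sum()

p_c = joint.sum(axis=(0, 1))              # p(C)
p_ac = joint.sum(axis=1)                  # p(A, C), indexed [a, c]
p_bc = joint.sum(axis=0)                  # p(B, C), indexed [b, c]

p_a_given_c = p_ac / p_c                  # p(A|C)
p_b_given_c = p_bc / p_c                  # p(B|C)
p_a_given_bc = joint / p_bc[None, :, :]   # p(A|B,C)
p_b_given_ac = joint / p_ac[:, None, :]   # p(B|A,C)

# p(A|B,C) p(B|C) = p(B|A,C) p(A|C), elementwise over all (a, b, c):
lhs = p_a_given_bc * p_b_given_c[None, :, :]
rhs = p_b_given_ac * p_a_given_c[:, None, :]
assert np.allclose(lhs, rhs)              # both sides equal p(A, B | C)
```

Both sides equal $p(A, B|C)$, which is why they must agree.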

Derivation:
$$
p(A|B, C) = p(A, B, C) / p(B, C)
$$
with
$$
p(A, B, C) = p(B|A, C)\, p(A|C)\, p(C), \ \ p(B, C) = p(B|C)\, p(C)
$$
Substituting and cancelling $p(C)$ gives $p(A|B, C) = p(B|A, C)\, p(A|C) / p(B|C)$, which rearranges to the identity above.